Association mapping for protein, total soluble sugars, starch, amylose and chlorophyll content in rice

Background Protein, starch, amylose and total soluble sugars are basic metabolites of seed that influence the eating, cooking and nutritional qualities of rice. Chlorophyll is responsible for the absorption and utilization of light energy, influencing photosynthetic efficiency in the rice plant. Mapping of these traits is very important for detecting a larger number of robust markers for their improvement through molecular breeding approaches. Results A representative panel population was developed by including 120 germplasm lines from the initial shortlisted 274 lines for mapping of the six biochemical traits using 136 microsatellite markers through association mapping. A wide genetic variation was detected for total protein, starch, amylose, total soluble sugars, chlorophyll a and chlorophyll b content in the population. Specific allele frequency, gene diversity, informative markers and other diversity parameters obtained from the population indicated that the population and markers were effective for mapping these traits. The fixation index values estimated from the population indicated the existence of linkage disequilibrium for the six traits. The population genetic structure at K = 3 showed correspondence with the majority of the members in each group for the six traits. The reported QTL qProt1, qPC6.2 and qPC8.2 for protein content; qTSS8.1 for total soluble sugars; qAC1.2 for amylose content; qCH2 and qSLCHH for chlorophyll a (Chl. a); and qChl5D for chlorophyll b (Chl. b) were validated in this population. The QTL qPC1.2 controlling total protein content; qTSS7.1, qTSS8.2 and qTSS12.1 for total soluble sugars; qSC2.1, qSC2.2, qSC6.1 and qSC11.1 for starch content; qAC11.1, qAC11.2 and qAC11.3 for amylose content; qChla8.1 for Chl. a content; and qChlb7.1 and qChlb8.1 for Chl. b, identified by both the Generalized Linear Model and the Mixed Linear Model, were detected as novel QTL.
The chromosomal regions on chromosome 8 at 234 cM for grain protein content and total soluble sugars and at 363 cM for Chl. a and Chl. b along with the position at 48 cM on chromosome 11 for starch and amylose content are genetic hot spots for these traits. Conclusion The validated, co-localized and the novel QTL detected in this study will be useful for improvement of protein, starch, amylose, total soluble sugars and chlorophyll content in rice. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-022-04015-8.

performs transport and storage of nutrients and many more functions. Protein content also affects the eating and cooking quality of rice [1]. Enhancement of protein content through breeding is an effective, economical and reasonably easy way to combat protein malnutrition [2]. Total soluble sugars (sucrose, glucose and fructose) and starch play important roles in signalling, maintain the overall structure and growth of plants, and mediate responses to stresses [3,4]. Total soluble sugars (TSS) influence the organoleptic quality of seeds and are key factors in the development of fresh and sweet flavours [5]. The rice kernel is rich in carbohydrate, with starch constituting > 80%. Protein content in the rice kernel is about 7-8% [6]. Starch profiles of rice are controlled by a complex genetic system (multiple quantitative trait loci). Amylose content (AC) is considered an indirect index of the major physical and chemical attributes of starch [7]. Starch and protein are basic metabolites of seed that influence the eating and cooking qualities, nutritional qualities and health benefits of grains [8]. Amylose and amylopectin are the two types of starch found in rice endosperm, of which amylose content mainly affects the eating and cooking qualities of rice [9]. The percentages of starch, amylose, protein and TSS are the key biochemical determinants of seed quality [10].
Rice varieties with higher chlorophyll content produce more dry matter and grain yield than genotypes with low chlorophyll content. Chlorophyll content (CC) is used in rice breeding programs as an effective index of high photosynthetic efficiency [11]. Chl. a and Chl. b are the main photosynthetic pigments in the chloroplasts of leaves. They are responsible for the absorption and utilization of light energy, influencing photosynthetic efficiency [12]. Rice breeders are making continuous efforts to improve these traits. However, significant and stable improvement has not been achieved because many genes/quantitative trait loci are involved, and these are also affected by the environment. Many QTL controlling seed protein content in rice grain have been reported from mapping studies [2,13,14]. A few QTL controlling chlorophyll content have been reported from genetic analysis studies [15][16][17]. QTL controlling TSS in rice grain have been reported in a few publications [18,19]. In addition, very few reports are available on mapping of starch [20][21][22] and amylose content [23,24] in rice.
Association mapping is an effective approach for detecting genes/QTL for complex traits in a wide genetic pool through marker-trait association analysis. Naturally occurring variation can be exploited to detect QTL that regulate such traits in rice through association mapping. The study of genetic diversity and structure helps in understanding the behaviour of the population. Population structure (Q) and relative kinship (K) analyses were used to check the composition of the panel population for linkage disequilibrium (LD) mapping. Marker-trait associations were estimated using both the generalized linear model (GLM) and the mixed linear model (MLM), which have been shown to perform better than other models. To improve eating, cooking and nutritional qualities and chlorophyll content through marker-assisted breeding, robust molecular markers are needed, along with validation of the reported QTL for these traits. Therefore, this mapping study aims to provide novel QTL and to validate reported target QTL for use in marker-assisted breeding. The main target of this study was to detect the candidate genes/QTL for total protein, total soluble sugars, starch, amylose and chlorophyll content in rice by genotyping with 136 simple sequence repeat (SSR) markers covering all the chromosomes.

Plant materials
A set of 274 diverse rice germplasm lines collected from the Gene Bank of ICAR-NRRI, Cuttack was used in the study (Supplementary Table 1; Fig. 1A). The set comprised germplasm collections from Assam, Madhya Pradesh, Kerala, Odisha and Manipur. To break seed dormancy, the harvested seeds were stored for 3 months before estimation of the biochemical traits: total protein content (TP), starch, amylose, total soluble sugars, and chlorophyll a and b. A representative panel population was developed by including 120 germplasm lines from the initial shortlisted 274 lines for mapping of the six biochemical traits using 136 microsatellite markers through association mapping.

Phenotyping for biochemical traits and statistical analyses
Chlorophyll a and chlorophyll b content were estimated from leaf samples of 10-day-old seedlings following the procedure suggested by Arnon [25]. Chl. a and Chl. b were expressed in mg/g fresh weight of leaf. Calibrated Near Infrared Spectroscopy (NIRS) was used to estimate starch (%), amylose (%) and protein (%). The NIR instrument was calibrated following the procedure of Bagchi et al. [26]. Modified partial least squares (mPLS) models corresponding to the best mathematical treatments were identified for starch, amylose and protein content. A 15 g sample of dehusked rice grain was placed in a small cup (inner diameter 66 mm, height 25 mm) and the above traits were measured with the calibrated NIR spectrometer. TSS content was estimated colorimetrically by the Anthrone method [27] and expressed as a percentage. Cropstat 7.0 software was used to estimate the critical difference (CD) and coefficient of variation (CV %) of the recorded phenotypic data.
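The chlorophyll calculation can be illustrated with a minimal sketch. The text only cites Arnon [25], so everything specific here is an assumption: an 80% acetone extract, absorbance readings at 663 and 645 nm, and the commonly quoted Arnon coefficients. The function name and the example readings are hypothetical and are not taken from the study.

```python
def arnon_chlorophyll(a663, a645, volume_ml, fresh_weight_g):
    """Return (Chl a, Chl b) in mg per g fresh weight.

    Assumes the commonly quoted Arnon equations for an 80% acetone
    extract; the study itself does not spell out these details.
    """
    # Converts mg/L of extract into mg per g of fresh leaf tissue.
    factor = volume_ml / (1000.0 * fresh_weight_g)
    chl_a = (12.7 * a663 - 2.69 * a645) * factor
    chl_b = (22.9 * a645 - 4.68 * a663) * factor
    return chl_a, chl_b


# Hypothetical readings: 0.1 g of leaf extracted in 10 ml of solvent.
chl_a, chl_b = arnon_chlorophyll(a663=0.5, a645=0.2, volume_ml=10.0, fresh_weight_g=0.1)
print(f"Chl a = {chl_a:.3f} mg/g, Chl b = {chl_b:.3f} mg/g")
```

Both values come out in mg/g fresh weight, the unit in which Chl. a and Chl. b are reported here.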

Genomic DNA isolation, PCR analysis and selection of SSR markers
Seeds of the panel comprising 120 rice accessions were germinated in petri plates. After 15 days, leaves were collected and genomic DNA was extracted using the CTAB method [28]. The isolated DNA was quantified through gel electrophoresis, and PCR analysis was performed using 136 SSR markers covering all the chromosomes (Supplementary Table 2). The reaction conditions were set for denaturation, annealing and extension. The PCR products were separated on 3% agarose gel. A 50 bp DNA ladder was used to determine the size of the amplicons in base pairs. Electrophoresis was performed by running the gel for 4 h at 2.5 V/cm, and band images were captured using the Gel Documentation System (SynGene). The methods for genomic DNA isolation, PCR analysis and selection of SSR markers followed earlier publications [29][30][31].

Molecular data analysis
For each genotype-primer combination, amplicons were scored for the presence or absence of the amplified products. The data were entered as discrete variables into a binary data matrix. For each SSR locus, the number of alleles (N), observed heterozygosity (H), major allele frequency (A), expected heterozygosity (He), and polymorphic information content (PIC) were estimated using PowerMarker V3.25 [32]. A similarity matrix was generated from the binary data using Jaccard's coefficients. The cladogram was generated using the unweighted pair group method with arithmetic average (UPGMA) algorithm [33,34] and visualized with Treeview 32 software [35]. The population structure, cluster analysis and AMOVA were performed using STRUCTURE 2.3.6, DARwin 5 and GenAlEx 6.5 software, respectively. STRUCTURE was run with the number of groups (K) varying from 1 to 10, with 10 runs for each K value. To determine the true value of K, the ad hoc statistic ∆K was used [36]. Parameters were set to a burn-in period of 150,000 and 150,000 Markov Chain Monte Carlo (MCMC) replications after burn-in, with an admixture and correlated allele frequencies model. The procedures followed for the software used were described in previous publications [37][38][39].
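To make the diversity parameters concrete, here is a minimal sketch of how expected heterozygosity (gene diversity) and PIC can be derived from the allele frequencies at one locus. It uses the standard textbook definitions, He = 1 − Σp_i² and PIC = He − Σ_{i<j} 2p_i²p_j², and is not a reproduction of PowerMarker's internals. The function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content (Botstein-style definition)."""
    # Sum over all unordered allele pairs of 2 * p_i^2 * p_j^2.
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical three-allele SSR locus.
allele_freqs = [0.5, 0.3, 0.2]
print(f"He = {gene_diversity(allele_freqs):.4f}, PIC = {pic(allele_freqs):.4f}")
```

By these definitions PIC is never larger than He, and the two converge as the number of alleles grows.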

Association analysis
TASSEL 5 software was used to determine the marker-trait associations of the six biochemical traits. Two statistical models, namely the general linear model and the mixed linear model, were used in the TASSEL 5.0 software. The genetic associations between the phenotypic traits of the rice accessions and the SSR markers were determined using the software [40]. Markers significantly associated with the traits were identified based on marker r² and p-values. The false discovery rate (FDR) and adjusted p-values (q values) were also calculated following previous publications [37,41].

Fig. 1 Frequency distribution of germplasm lines in the very high, high, medium, low and very low classes for chlorophyll a, chlorophyll b, starch, amylose, total protein and total soluble sugars estimated (A) from 274 rice landraces and (B) from 120 landraces present in the panel population
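The FDR adjustment mentioned in the association analysis can be sketched as follows. The text does not name the procedure, so the Benjamini-Hochberg step-up method is assumed here purely for illustration. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical marker p-values for one trait.
example_p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(example_p)])
```

A marker would then be retained when its q value falls below the chosen FDR level.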

Results
Phenotyping for protein, total soluble sugars, starch, amylose and chlorophyll content in the rice germplasm lines
A total of 274 rice germplasm lines were phenotyped for protein, total soluble sugars, starch, amylose and chlorophyll content (Supplementary Table 1). A representative panel population was developed from the original germplasm lines based on the mean phenotypic values of the six traits. Each trait was classified into phenotypic groups based on its mean estimates. The phenotyping results for protein, total soluble sugars, starch, amylose and chlorophyll content of the 274 lines showed clear-cut differences among the genotypes (Supplementary Table 1; Fig. 1A). The frequency distribution of the original population was broadly classified into 5 groups for each of the 6 biochemical traits studied (Fig. 1A). A working panel population was constituted by selecting 120 germplasm lines from all the phenotypic groups of each trait (Table 1). The estimated mean values of the 6 biochemical traits from the panel population also revealed significant variation among the genotypes for each trait (Table 1; Fig. 1B; Fig. 2). Very high grain protein content of > 15% was detected in the landraces Bharati and Pk-21. In addition, > 12.5% protein was obtained from the germplasm lines Lalgundi, D1, Mahamaga, Langmanbu, Kartiksal, Jyothi, Adira-1, Adira-3, Chudi, Pondremunduria, Sreyas, Cheruvirippu, Kakchengphou, Ezhoml-2 and Kozhivalan. Mikirahu, Batachudi, Chitapa, Kusumal, Ahimachutki, Ampang, Noorthipathu, Pandya and Malbar showed very high values for total soluble sugars. Very high starch content of > 95% was observed in the landraces Manavari, Pandya, Badra and Kantakapura. Intermediate amylose content is desirable for consumption, but very high content of about 30% or more was noticed in the landraces Kapanthi, Jaya and Chingforechokua. Very high Chl. a content of > 3 mg/g fresh weight of leaf was noticed in the lines Jira, Bilipandya, Gauri, Aujari, Lusai and Malbar. Very high Chl. b content of > 2 mg/g fresh weight of leaf was detected in the germplasm lines Jira, Bilipandya, Aujari, Lusai, Phourrel, Chingphou and Phoaujaarangbele (Table 1). The genotypes identified may be useful as donor parents for improvement of these traits in future breeding programs.

Genotype-by-trait biplot and correlation analyses
The first two principal components were used to plot the scatter diagram for the 6 biochemical traits in the panel population of 120 genotypes, and a genotype-by-trait biplot was generated (Fig. 3A). The first and second principal components accounted for 34.6% and 28.24% of the total variability, with eigenvalues of 2.079 and 1.694, respectively. Among the 6 biochemical traits, Chl. a showed maximum diversity, followed by Chl. b and total protein content, based on the principal component analysis of the panel population (Fig. 3A). The PCA diagram distributed the germplasm lines across all 4 quadrants based on the 6 traits. All the high-protein germplasm lines were in quadrant IV (top left). All the high-chlorophyll germplasm lines were placed in quadrants I (top right) and II (bottom right). No single germplasm line combined high estimates for all six traits. Thus, at least 2 germplasm lines need to be selected as donor parents for the improvement of these six traits.
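The reported percentages follow from the eigenvalues if the PCA was run on the correlation matrix of the six standardized traits, so that the eigenvalues sum to 6. That is an assumption, as the text does not state how the PCA was set up. The short check below uses the two eigenvalues given above; the function name is hypothetical.

```python
def explained_percent(eigenvalue, n_traits):
    """Share of total variance for one component of a correlation-matrix PCA."""
    # With standardized traits the total variance equals the number of traits.
    return 100.0 * eigenvalue / n_traits


N_TRAITS = 6
for label, eig in (("PC1", 2.079), ("PC2", 1.694)):
    print(f"{label}: {explained_percent(eig, N_TRAITS):.2f}% of total variability")
```

Under this assumption the two components together explain a little under 63% of the variability, which agrees with the reported figures to within rounding.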
The correlation analysis in the panel population revealed that chlorophyll a content had a strong positive correlation with chlorophyll b content. A strong positive correlation was observed between the starch and amylose content. In addition, total protein content and total soluble sugars content also showed a strong positive correlation in the mapping population. A negative correlation was recorded for starch content with amylose content. In addition, total protein content showed a negative correlation with amylose content. A negative association was also observed between chlorophyll content and total protein content (Fig. 3B).

Cluster analysis
The panel of 120 genotypes was broadly clustered into two groups based on the mean values of the six biochemical traits studied. The smaller cluster accommodated 3 genotypes that showed low values for TSS, Chl. a, Chl. b, starch and amylose content. The bigger cluster consisted of the remaining 117 genotypes and was further divided into two sub-clusters, one with 66 and the other with 51 genotypes (Fig. 4). Sub-cluster I grouped 66 genotypes with medium to low and very low mean values for starch content and medium to high and very high values for amylose content. Sub-cluster II grouped 51 genotypes with high to very high starch and amylose content. Sub-cluster I was split into two on the basis of starch content: one group contained only the genotype Liktimachi, with very low starch content, and the other contained the remaining 65 genotypes, whose starch content ranged from low to very high. Amylose content further divided these 65 genotypes into two groups: Jaya, with very high amylose content, formed one group, and the remaining 64 genotypes, with mean amylose content ranging from medium to high, formed the second. The group of 64 genotypes again formed two sub-clusters, of 31 genotypes (all with similar, i.e. medium, values for starch content) and 33 genotypes (starch content ranging from medium to low and amylose content ranging from medium to high). Sub-cluster II, with 51 genotypes, was sub-grouped into two: one with TKM10 and Jira, both with high mean values for amylose and starch content and very low values for TP, and the other with 49 genotypes showing similar starch and amylose content, ranging from low to high mean values.
The sub-group of 49 genotypes was again divided into two according to similarities in starch, amylose and TP: one group included Manavari and Pandya, and the other the remaining 47 genotypes. Manavari and Pandya were similar, both having very high starch and medium values for amylose and TP content. The group of 47 genotypes ranged from high to very high for starch, low to medium for amylose and low to very high for TP. This group was divided into sub-groups of 22 and 25 genotypes. The cluster of 22 genotypes shared similar mean values: medium to very low for Chl. b, high for starch, medium for amylose and medium to very high for TP. The other cluster, with 25 genotypes, had mean values ranging from high to very high for starch and low to medium for amylose.

Assessment of molecular diversity using the SSR markers
Diversity of the panel population was assessed from the diversity parameters estimated by genotyping the population with 136 SSR markers. A total of 506 marker alleles were detected in the population, which indicated that the population is diverse (Supplementary Table 3). Also,

Genetic structure analysis
The population genetic structure analysed with the STRUCTURE software grouped the panel population into subgroups based on the peak ∆K value at an assumed K value. The highest ∆K peak (259.77) was obtained at K = 2, dividing the whole population into two sub-populations (Supplementary Fig. 1). However, these two sub-populations did not correspond well with the six biochemical traits estimated from the panel. Therefore, the next ∆K peak (106.54), at K = 3, was considered for classification of the panel population. Three sub-populations were obtained on the basis of this ∆K peak from genotyping with 136 SSR markers (Fig. 5). The sub-populations obtained at K = 3 showed good correspondence with each of the studied biochemical traits (Supplementary Table 4). Genotypes with ≥ 80% probability were assigned to the corresponding sub-population and the rest were treated as admixed genotypes. Sub-population 1 accommodated 81 genotypes, the majority of which were poor or very poor for the target traits. Sub-population 2 included 8 germplasm lines, the majority of which were moderate in content of the target traits. The sub-population 3 The net nucleotide distance (allele-frequency divergence) between sub-population 1 and sub-population 2 was 0.1704; between sub-population 1 and sub-population 3 it was estimated to be 0.1186, while between sub-populations 2 and 3 it was estimated to be 0.2302. The average distance (expected heterozygosity) among the members of sub-population 1 was 0.4264; among the individuals of sub-population 2 it was 0.3901, while 0.3783 was computed for sub-population 3. The population structure analysis classified the population into sub-populations based on the peak value of ∆K at K = 3 (Fig. 5). The majority of the germplasm lines with high and very high estimates of the biochemical traits were found in sub-population 3 (SP 3; blue color), while germplasm lines with moderate values were in sub-population 2 (SP 2; green color). The germplasm lines with low and very low values for the six biochemical traits were found in sub-population 1 (SP 1; red color). The alpha value estimated by the STRUCTURE software at K = 3 for the panel population was very low (alpha = 0.046). The alpha value showed a leptokurtic distribution for the panel population, while the Fst values of each sub-population were distributed almost symmetrically at K = 3 (Supplementary Fig. 2).
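The ∆K statistic used to choose K can be sketched as below. This follows the commonly used form, ∆K = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd L(K), where L(K) is the log-likelihood reported by each STRUCTURE run. The log-likelihood values are invented for illustration and are not the study's data; the function name is hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of log-likelihoods from replicate runs.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # Delta K is undefined at the ends of the K range.
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2.0 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        spread = stdev(lnp_by_k[k])
        if spread > 0:
            result[k] = second_diff / spread
    return result


# Hypothetical replicate log-likelihoods for K = 1..4 (three runs each).
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-880.0, -882.0, -878.0],
    4: [-875.0, -870.0, -880.0],
}
scores = delta_k(runs)
print(scores, "best K:", max(scores, key=scores.get))
```

When the top-ranked K fits the phenotypes poorly, as with K = 2 here, the next peak is simply the second-largest entry in this table.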
The cluster analysis based on the genotyping data from 136 SSR markers placed the germplasm lines into clusters that showed correspondence with the studied biochemical traits. The UPGMA tree separated the genotypes into 4 clusters (Fig. 6). The clusters accommodated the germplasm lines according to the structure sub-populations, and the majority of the germplasm lines were in sub-population 1, depicted in blue in the tree (Fig. 6). The admixed germplasm lines of the population are depicted in brick red in the neighbour joining tree, while the members of sub-population 2 are in pink (Fig. 6).

Molecular variance (AMOVA) and LD decay plot analysis
The members of a sub-population are similar to one another for various traits. Analysis of molecular variance (AMOVA) was performed to estimate the genetic variation within and between the sub-populations (Table 2). At K = 3, the genetic variation was computed to be 8% among the populations, nil among individuals and 92% within the individuals of the panel population. Deviation from Hardy-Weinberg expectations was checked from the estimates of Wright's F statistics. The uniformity of individuals within a sub-population was checked using the FIS parameter, while FIT was used to assess the variation of individuals within the total population. The estimates of FIT and FIS of the total population and within population were  Fig. 2). The use of marker-trait association depends on the existence of LD for the traits in a population. The continuance of marker-trait association in a population depends on the LD decay rate over time. The presence of different inferred values in a germplasm line may depend on the LD decay rate in a population. New admixed types indicate the possibility of new genes or allelic variants for the target traits in a population. The LD plot was constructed by plotting the syntenic r² values against the physical distance between markers in million base pairs to show the trend of linkage disequilibrium decay in the population (Fig. 7A). Tightly linked markers showed higher r² values, and the average r² values decreased rapidly with increasing linkage distance. The LD plot revealed that decay was delayed at the beginning in the panel population for the studied traits. LD declined for the associated markers at about 1-2 Mb, after which a very slow and gradual decrease was noticed in the plot. This clearly revealed the continuance of linkage disequilibrium decay in the population for the six biochemical traits studied. The estimate of LD decay may be influenced by mutation, non-random mating, selection, migration or admixture, and genetic drift. The LD decay plot gives a clue to the creation of genetic admixture groups in the population for the various biochemical traits. The plot of marker 'P' versus marker 'F' and marker r² also showed a similar trend (Fig. 7B). The associated markers detected from this analysis indicate the strength of the markers for use in improvement programs for the biochemical traits.
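The r² value plotted in an LD decay curve can be illustrated with the simplest case of two biallelic loci scored on phased haplotypes, where r² = D² / (p_A(1 − p_A)p_B(1 − p_B)) and D = p_AB − p_A·p_B. This is a simplification: SSR markers are multi-allelic, and the exact estimator used by the software is not described in the text. The function name and haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Squared correlation between two biallelic loci.

    haplotypes is a list of (a, b) pairs, each coded 0 or 1.
    Returns None when either locus is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # Disequilibrium coefficient D.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


# Hypothetical haplotypes: complete LD versus no association.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked), ld_r2(unlinked))
```

Plotting such pairwise r² values against the distance between the two markers gives a decay curve of the kind shown in Fig. 7A.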

Principal coordinates and cluster analyses for genetic relatedness among the germplasm lines
The two-dimensional principal coordinate analysis (PCoA) plot was constructed from the genotyping data of 136 SSR markers and grouped the 120 panel germplasm lines according to their genetic relatedness (Fig. 8A). The inertia of component 1 was 11.59%, while that of component 2 was 7.49%. The genotypes were distributed across the four quadrants, forming 2 major and 2 minor groups (Fig. 8A). The biggest group accommodated almost all the germplasm lines of sub-population 1, which carry low quantities of the biochemical traits and are depicted in blue. Quadrants I and II formed a group that accommodated the majority of sub-population 3, carrying high estimates of the biochemical traits. The members of sub-population 2 were present in quadrant III (bottom left), in pink. The admixed types are present in quadrants II and III and depicted in brick red (Fig. 8A). An un-rooted phylogenetic tree was also constructed. The tree indicates no common ancestor or node. The germplasm lines with high to very high estimates of the biochemical traits are grouped together, forming sub-population 3. This group is depicted in green in the un-rooted tree (Fig. 8B). The variation among the landraces can easily be assessed from the distance of each landrace depicted in the tree (Fig. 8B). The relationships in both trees are estimated without considering the evolutionary time of the landraces.

Marker-trait associations with biochemical traits in rice
Association of the six biochemical traits with molecular markers was analysed using TASSEL 5 software with the GLM and MLM approaches. Associations were detected at both the < 1% and < 5% error levels. The six traits, viz. total protein content, total soluble sugars, starch, amylose, chlorophyll a and chlorophyll b content, were above the threshold level and found to be associated with the SSR markers using the GLM and MLM approaches (Table 3). With the GLM at the 5% level, 200 marker-trait associations were observed, whereas 60 marker-trait associations were detected by GLM at < 1% error (Supplementary Table 5). The MLM approach showed 110 associations at < 5% error, while 26 associations were detected at the < 1% level (Supplementary Table 6). Considering both the GLM and MLM approaches at the < 1% error level, 21 significant marker-trait associations were detected. Three significant marker-trait associations were detected for each of Chl. a, Chl. b and starch content, while 4 associations were detected for each of TSS, TP and amylose content by both models (Table 3; Fig. 9A). The markers detected by both the GLM and MLM approaches are considered robust markers. The generated Q-Q plot also confirmed the association of the markers with the 6 biochemical traits in rice (Fig. 9B).
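The idea of keeping only associations supported by both models at the 1% level can be sketched as a simple filter. The data layout (a dictionary keyed by trait and marker), the function name, the marker labels and the p-values are all hypothetical and are not TASSEL output.

```python
def robust_associations(glm, mlm, alpha=0.01):
    """Return (trait, marker) pairs significant in both models at level alpha."""
    return sorted(
        key
        for key, p_glm in glm.items()
        # A pair missing from the MLM results is treated as non-significant.
        if p_glm < alpha and mlm.get(key, 1.0) < alpha
    )


# Hypothetical p-values keyed by (trait, marker).
glm_p = {("TP", "M1"): 0.002, ("TP", "M2"): 0.030, ("TSS", "M3"): 0.004}
mlm_p = {("TP", "M1"): 0.008, ("TP", "M2"): 0.020, ("TSS", "M3"): 0.060}
print(robust_associations(glm_p, mlm_p))
```

Because the MLM also accounts for kinship, it usually retains fewer associations than the GLM, so the intersection acts as the stricter criterion.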
Chlorophyll a and chlorophyll b content each showed significant association with 3 markers in both models. The SSR markers RM1347, RM405 and RM3231 associated with Chl. a are located on chromosomes 2, 5 and 8 at the 82, 109 and 363 cM positions, respectively. Chl. b showed significant association with the markers RM440, RM5436 and RM3231 (Table 3). Starch content showed association with the markers RM3701, RM20377 and RM6374. Amylose content showed association with the markers RM3701, RM315, RM167 and RM6091, analysed by TASSEL using both models. Four markers, namely RM556, RM220, RM5638 and RM253, showed associations with protein content estimated from the panel population. The QTL for Chl. a and Chl. b on chromosome 8 at the 363 cM position were co-localized and associated with the marker RM3231. Another two QTL on chromosome 8 at the 234 cM position, controlling protein and total soluble sugars content, were also found to be co-localized. Similarly, starch and amylose content were both significantly associated with the marker RM3701 and were closely located on chromosome 11 at the 48 cM position.

Discussion
Protein, starch, amylose and total soluble sugars are basic metabolites of seed that influence the eating, cooking and nutritional qualities of rice. Chlorophyll is responsible for the absorption and exploitation of light energy, influencing photosynthetic efficiency in rice. The results of the study showed wide genotypic variation among the germplasm lines for protein, starch, amylose, total soluble sugars and chlorophyll content in the mapping population, and hence the developed panel was effective for mapping the target traits. Earlier publications reported the germplasm line ARC10063, containing 16.41% grain protein, as a donor line [2,42]. In this investigation, another landrace, Bharati, showed a protein content of 18%. This landrace will serve as a potential donor for protein improvement programs. The markers employed showed high PIC, gene diversity and specific allele values in the panel population, indicating a diverse panel population. Many earlier studies also report high genetic diversity parameters in various rice populations [43][44][45][46][47][48]. The landraces studied in the present investigation were collected from five states known for rich genetic diversity in rice, including the secondary centre of origin [49][50][51][52]. Hence, the panel population is effective for mapping the six biochemical traits of rice.
The population genetic structure categorized the panel population into three sub-populations. The correlation between structure and grain protein content in rice was reported earlier by Pradhan et al. [2]. However, population structure analyses for starch, amylose, total soluble sugars and chlorophyll using rice landraces are not available, although the correlation of structure with phenotype in rice has been reported by many researchers [53][54][55][56][57][58]. The detection of many admixed landraces in the population gives a clue that the traits evolved from different germplasm lines during the evolutionary process. This is also clear from the existence of many groups and subgroups in the population (Figs. 4 and 5).
The total protein content estimated from each germplasm line of the panel showed significant associations with RM556, RM220, RM5638 and RM253 in both models. RM220 is located on chromosome 1 at the 240 cM position and showed a marker r² value of about 0.06 in both models (Fig. 9A). The mapping results of Kinosita et al. [59] and Jang et al. [20] reported protein-controlling QTL on chromosome 1, but far from the QTL detected in the present investigation. Hence, this QTL has not been reported in earlier studies and is designated qPC1.2. RM5638 is also present on chromosome 1, at the 20.9 Mb position, with a marker r² value of about 0.07. The mapping results of Aluko et al. [60], Yang et al. [61] and Kinosita et al. [59] reported QTL controlling protein content on chromosome 1 at ~ 21-38 Mb, which is close to qProt1 reported by Terao and Hirose [62]. The present investigation detected a protein-controlling QTL in the same region. Therefore, the previously detected QTL qProt1 is validated in this study and will be useful in marker-assisted breeding programs for protein content enhancement. The marker RM253 is present on chromosome 6 at the 5.4 Mb position and showed a marker r² value of > 0.06 in both models. Kinosita et al. [59] reported the QTL qPC6.2 in the marker interval 5.2-9.7 Mb. In the present investigation we detected a QTL within this interval, similar to Kinosita et al. [59]. Therefore, the previously detected QTL qPC6.2 is validated in this study and will be useful in marker-assisted breeding programs for protein content enhancement. RM556 is present on chromosome 8 at  [63] is validated in this study. The QTL was not assigned any designation by Yun et al. [63] and hence the QTL is designated qPC8.2 (Fig. 9A). Significant marker-trait associations for total soluble sugars were detected with the markers RM247, RM337, RM248 and RM566 by both the GLM and MLM approaches.
In a mapping study, Yang et al. [61] reported a QTL for total soluble sugar, qSS8.1, in the marker interval RM1235-RM1376 at the 25-30 cM position. In our study, RM337, located at 27 cM on chromosome 8, was strongly associated with total soluble sugar in the population. The QTL qTSS8.1 reported by Yang et al. [61] is therefore validated in the present study and will be useful for total soluble sugar improvement programs in rice. No genes or QTL for total soluble sugar were reported in previous studies on chromosomes 7, 8 and 12 at the 157, 236 and 23 cM positions. These QTL are designated as qTSS7.1, qTSS8.2 and qTSS12.1, respectively. The marker RM248, present on chromosome 7 at the 157 cM position, showed a very high marker r² value of > 0.12 with total soluble sugars in both models. The marker-trait associations were detected by both models (GLM and MLM) at p < 0.01 with low p values and high r² values (Table 3), and the Q-Q plot also confirmed the associations of these markers (Fig. 9B). The strongly associated SSR markers RM248, RM566 and RM247 may therefore be useful in marker-assisted programs for improving total soluble sugars in rice.

Fig. 9 A. The positions of the QTL on the chromosomes for chlorophyll a, chlorophyll b, starch, amylose, total protein and total soluble sugars. B. Distribution of marker-trait association and quantile-quantile (Q-Q) plot generated from Mixed Linear Model analysis for the six biochemical traits detected by association mapping at p < 0.01 in rice
In our investigation, amylose content was significantly associated with the marker RM315 on chromosome 1 at the 92 cM position. Li et al. [64] reported a QTL, qAC1.1, for amylose content on chromosome 1, but at the 40 cM position. Zheng et al. [65] reported a QTL for amylose content, qAC1.2, at 102.5 cM in the marker interval C904-R2632. Swamy et al. [66] reported the QTL qac1.1 for amylose content at 60-90 cM in the marker interval RM243-RM582 on chromosome 1, and also reported qac1.2 and gel-2 for amylose content and gel consistency in the marker interval RM580-RM81 at 90-100 cM on chromosome 1. Thus, these reports support validation of the QTL qAC1.2 for amylose content on chromosome 1. A QTL for amylose content, qAC11.1, was detected on chromosome 11 at 27 cM with RM6091, showing an r² value of 0.1099. This QTL was reported in the earlier findings of Lee et al. [14] and is validated in this study. QTL for amylose content were also detected by both models on chromosome 11 at 123 cM and 304 cM with the associated markers RM167 and RM6091, respectively. No reports are available on the detection of QTL controlling amylose content at positions 48 cM, 123 cM and 304 cM on chromosome 11. Hence, these three QTL may be novel and are designated as qAC11.1, qAC11.2 and qAC11.3 (Fig. 9A).
In this investigation, the marker RM6374 on chromosome 2 at 249 cM was associated with seed starch content, showing an r² value of 0.06383. Panahabadi et al. [22] reported qSTh2.1 for starch content on the same chromosome, but at a position of 4.04 cM. This suggests that qSC2.1 at 247 cM and qSC2.2 at 249 cM are novel QTL for seed starch content. Two further QTL for starch content, on chromosomes 6 and 11 at the 212 and 48 cM positions, were detected by both models (Table 3; Fig. 9A). No previous reports are available for starch-controlling QTL at these positions. These two QTL may be novel and are designated as qSC6.1 and qSC11.1. Yang et al. [14] reported the ALK gene as a starch synthesis gene at 12.9 cM on chromosome 6 with marker RM8200.
Chlorophyll a content was significantly associated with the SSR markers RM1347, RM405 and RM3231, located on chromosomes 2, 5 and 8 at the 82, 109 and 363 cM positions, respectively. The QTL qCH2 for chlorophyll a on chromosome 2 was reported earlier by Kun et al. [67] within the marker interval RM327-RM123 at 80-95 cM. We also detected the QTL at the 82 cM position. Therefore, qCH2 is validated in this mapping study and will be useful in chlorophyll improvement programs in rice. However, no QTL for chlorophyll content has been reported on chromosome 8 at the 363 cM position. This QTL may be new and is designated as qChla8.1. The QTL detected on chromosome 5 was located at the 109 cM position. Ye et al. [17] reported a QTL for chlorophyll content in the 110.46-118.71 cM region on chromosome 5. The detected QTL may be the same QTL, qSLCHH, reported by Ye et al. [17]. Chlorophyll b content showed significant association with the markers RM440, RM5436 and RM3231, located on chromosomes 5, 7 and 8 at the 67, 136 and 363 cM positions, respectively. Zhang [68] reported a QTL, qChlb 5D, controlling chlorophyll on chromosome 5 at the 68.2 cM position. The QTL detected by us at 67 cM may be the same QTL, and hence qChlb 5D is validated in this mapping population. The other two QTL detected for this trait have not been reported earlier at these locations and are designated as qChlb7.1 and qChlb8.1 (Table 3; Fig. 9A).
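The validation reasoning for the chlorophyll QTL compares each detected marker position with a previously reported interval on the same chromosome. A minimal Python sketch of that comparison is given below, using only the positions quoted in the text. The 2 cM tolerance and the function names are assumptions introduced for illustration. The study does not state a numerical threshold, and positions taken from different genetic maps may not be directly comparable.

```python
# Illustrative sketch only. TOLERANCE_CM is an assumed value, not a
# threshold reported in the study.
TOLERANCE_CM = 2.0


def distance_to_interval(position, start, end):
    """Return 0.0 if position lies inside [start, end], otherwise the
    distance (in cM) to the nearest bound of the interval."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0.0


# (QTL name, chromosome, detected position, reported start, reported end),
# all positions in cM as quoted in the text. A QTL reported at a single
# position is written as an interval of zero length.
comparisons = [
    ("qCH2", 2, 82.0, 80.0, 95.0),
    ("qSLCHH", 5, 109.0, 110.46, 118.71),
    ("qChlb 5D", 5, 67.0, 68.2, 68.2),
]


def classify(items, tolerance=TOLERANCE_CM):
    """Map each QTL name to (gap in cM, within-tolerance flag)."""
    results = {}
    for name, _chrom, pos, start, end in items:
        gap = distance_to_interval(pos, start, end)
        results[name] = (round(gap, 2), gap <= tolerance)
    return results


results = classify(comparisons)
for name, (gap, close) in results.items():
    print(f"{name}: gap = {gap} cM, within tolerance = {close}")
```

Under this assumed tolerance, the position for qCH2 falls inside the reported interval, while those for qSLCHH and qChlb 5D lie just outside the reported positions, which is consistent with the more cautious wording ("may be the same QTL") used for them.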
The QTL qChla8.1 and qChlb8.1, for Chl. a and Chl. b, were co-localized on chromosome 8 at the 363 cM position. Another two QTL, qPC8.2 and qTSS8.2, controlling protein and total soluble sugars content, were found to be co-localized on chromosome 8 at the 234 cM position. Similarly, the QTL qSC11.1 and qAmy11.1, for starch and amylose content, are significantly associated with the marker RM3701 and are closely located at the 48 cM position on chromosome 11 (Table 3). This indicates that these pairs of traits are likely to be inherited together by the progeny. In addition, these pairs of traits showed strong positive correlations and hence should be easier to improve together in breeding programs. Similar findings were reported in earlier mapping studies for high temperature tolerance, protein, iron and zinc content, iron toxicity tolerance, seedling vigour and antioxidant content in rice [2,38,45,68,69,70].
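The co-localization described above amounts to finding QTL for different traits that share a chromosome and lie at nearly the same position. A short Python sketch of that check follows, using the QTL names and positions quoted in the text. The 2 cM window and the helper name `colocalized_pairs` are assumptions made for illustration and are not part of the study's method.

```python
from itertools import combinations

# Assumed window for calling two QTL co-localized. The study does not
# report a numerical cut-off.
WINDOW_CM = 2.0

# (QTL name, chromosome, position in cM) as quoted in the text.
# qChlb7.1 is included as an example with no co-localized partner.
qtl = [
    ("qChla8.1", 8, 363.0),
    ("qChlb8.1", 8, 363.0),
    ("qPC8.2", 8, 234.0),
    ("qTSS8.2", 8, 234.0),
    ("qSC11.1", 11, 48.0),
    ("qAmy11.1", 11, 48.0),
    ("qChlb7.1", 7, 136.0),
]


def colocalized_pairs(records, window=WINDOW_CM):
    """Return pairs of QTL names on the same chromosome whose positions
    differ by no more than `window` cM."""
    pairs = []
    for (n1, c1, p1), (n2, c2, p2) in combinations(records, 2):
        if c1 == c2 and abs(p1 - p2) <= window:
            pairs.append((n1, n2))
    return pairs


pairs = colocalized_pairs(qtl)
for first, second in pairs:
    print(f"{first} and {second} are co-localized")
```

With these inputs the sketch returns the three pairs named in the text and leaves qChlb7.1 unpaired.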

Conclusion
A wide genetic variation for protein, starch, amylose, total soluble sugars and chlorophyll content was observed in the germplasm lines used for the association study. Prospective donor lines carrying higher content of these biochemical traits were identified. The STRUCTURE software classified the representative population into 3 genetic structure groups. Specific allele frequency, gene diversity, informative markers and other diversity parameters were estimated from the panel population using 136 SSR markers. Various groups and sub-groups obtained from the population showed relationships among their members for the biochemical traits. Linkage disequilibrium